{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# FINN - End-to-End Flow\n",
    "-----------------------------------------------------------------\n",
    "\n",
    "In this notebook, we will show how to take a simple, binarized, fully-connected network trained on the MNIST data set and take it all the way down to a customized bitfile running on a PYNQ board. \n",
    "\n",
    "This notebook is quite lengthy, and some of the cells (involving Vivado synthesis) may take up to an hour to finish running. To let you save and resume your progress, we will save the intermediate ONNX models that are generated in the various steps to disk, so that you can jump back directly to where you left off.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "The FINN compiler comes with many *transformations* that modify the ONNX representation of the network according to certain patterns. This notebook will demonstrate a *possible* sequence of such transformations to take a particular trained network all the way down to hardware, as shown in the figure below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](finn-design-flow-example.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The white fields show the state of the network representation in the respective step. The colored fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 sections represented by a different color, each of it includes several flow steps. The flow starts in top left corner with Brevitas export (green section), followed by the preparation of the network (blue section) for the Vitis HLS synthesis and Vivado IPI stitching (orange section), and finally building a PYNQ overlay bitfile and testing it on a PYNQ board (yellow section).\n",
    "There is an additional section for functional verification (red section) on the right side of the diagram, which we will not cover in this notebook. For details please take a look in the verification notebook which you can find [here](tfc_end2end_verification.ipynb)\n",
    "\n",
    "\n",
    "This Jupyter notebook is organized based on the sections described above. We will use the following helper functions, `showSrc` to show source code of FINN library calls and `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.util.visualization import showSrc, showInNetron\n",
    "from finn.util.basic import make_build_dir\n",
    "import os\n",
    "    \n",
    "build_dir = os.environ[\"FINN_BUILD_DIR\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Outline\n",
    "-------------\n",
    "1. [Brevitas export](#brev_exp)\n",
    "2. [Network preparation](#nw_prep)\n",
    "3. [Hardware build](#vivado)\n",
    "4. [PYNQ deployment](#hw_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Brevitas export <a id='brev_exp'></a>\n",
    "FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training and the FINN Docker image comes with several [example Brevitas networks](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq). To show the FINN end-to-end flow, we'll use the TFC-w1a1 model as example network.\n",
    "\n",
    "First a few things have to be imported. Then the model can be loaded with the pretrained weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnx\n",
    "from finn.util.test import get_test_model_trained\n",
    "import brevitas.onnx as bo\n",
    "\n",
    "tfc = get_test_model_trained(\"TFC\", 1, 1)\n",
    "bo.export_finn_onnx(tfc, (1, 1, 28, 28), build_dir+\"/tfc_w1_a1.onnx\"); # semicolon added to suppress log"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model was now exported, loaded with the pretrained weights and saved under the name \"tfc_w1_a1.onnx\".\n",
    "To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "showInNetron(build_dir+\"/tfc_w1_a1.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the model in .onnx format, we can work with it using FINN. For that, `ModelWrapper` is used. It is a wrapper around the ONNX model which provides several helper functions to make it easier to work with the model. 'ModelWrapper' is imported from the [QONNX repo](https://github.com/fastmachinelearning/qonnx), this repository contains several functionality that is used in FINN."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qonnx.core.modelwrapper import ModelWrapper\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1_a1.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the model is prepared and could be simulated using Python. How this works is described in the Jupyter notebook about verification and can be found [here](tfc_end2end_verification.ipynb#simpy).\n",
    "\n",
    "The model can now also be processed in different ways. The principle of FINN are analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. A transformation pass changes the model and returns the changed model back to the FINN flow.\n",
    "\n",
    "Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. In the next section these are discussed in more detail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Network preparation <a id='nw_prep'></a>\n",
    "\n",
    "* [FINN-style Dataflow Architectures](#dataflow_arch)\n",
    "* [Tidy-up transformations](#basic_trafo)\n",
    "* [Streamlining](#streamline)\n",
    "* [Conversion to HLS layers](#hls_layers)\n",
    "* [Creating a Dataflow Partition](#dataflow_partition)\n",
    "* [Folding and Datawidth Converter, FIFO and TLastMarker Insertion](#folding)\n",
    "\n",
    "\n",
    "In this section, we will put the network through a series of transformations that puts it in a form that can be stitched together to form a FINN-style dataflow architecture, yielding a high-performance, high-efficiency FPGA accelerator."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FINN-style Dataflow Architectures <a id='dataflow_arch'></a>\n",
    "\n",
    "We start with a quick recap of FINN-style dataflow architectures. The key idea in such architectures is to parallelize across layers as well as within layers by dedicating a proportionate amount of compute resources to each layer, as illustrated in the figure below taken from the [FINN-R paper](https://arxiv.org/pdf/1809.04570.pdf):\n",
    "\n",
    "![](finn-hw-arch.png)\n",
    "\n",
    "In practice, the compute arrays are instantiated by function calls to optimized Vitis HLS building blocks from the [finn-hlslib](https://github.com/Xilinx/finn-hlslib) library. As these function calls can only handle certain patterns/cases, we need to transform the network into an appropriate form so that we can replace network layers with these function calls, which is the goal of the network preparation process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tidy-up transformations <a id='basic_trafo'></a>\n",
    "This section deals with some basic transformations, which are applied to the model like a kind of \"tidy-up\" to make it easier to be processed. They do not appear in the diagram above, but they are applied in many steps in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next transformation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These transformations are:\n",
    "* GiveUniqueNodeNames\n",
    "* GiveReadableTensorNames\n",
    "* InferShapes\n",
    "* InferDataTypes\n",
    "* FoldConstants\n",
    "* RemoveStaticGraphInputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the first two transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) the nodes in the graph are first given unique (by enumeration) names, then the tensors are given human-readable names (based on the node names). The following two transformations (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These transformations can almost always be applied without negative effects and do not affect the structure of the graph, ensuring that all the information needed is available.\n",
    "\n",
    "The next listed transformation is `FoldConstants`, which performs constant folding. It identifies a node with constant inputs and determines its output. The result is then set as constant-only inputs for the following node and the old node is removed. Although this transformation changes the structure of the model, it is a transformation that is usually always desired and can be applied to any model. And finally, we have `RemoveStaticGraphInputs` to remove any top-level graph inputs that already have ONNX initializers associated with them. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These transformations can be imported and applied as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs\n",
    "from qonnx.transformation.infer_shapes import InferShapes\n",
    "from qonnx.transformation.infer_datatypes import InferDataTypes\n",
    "from qonnx.transformation.fold_constants import FoldConstants\n",
    "\n",
    "model = model.transform(InferShapes())\n",
    "model = model.transform(FoldConstants())\n",
    "model = model.transform(GiveUniqueNodeNames())\n",
    "model = model.transform(GiveReadableTensorNames())\n",
    "model = model.transform(InferDataTypes())\n",
    "model = model.transform(RemoveStaticGraphInputs())\n",
    "\n",
    "model.save(build_dir+\"/tfc_w1_a1_tidy.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result of these transformations can be viewed with netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. Also the whole upper area could be folded, so that now the first node is \"Reshape\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "showInNetron(build_dir+\"/tfc_w1_a1_tidy.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding Pre- and Postprocessing <a id='prepost'></a>\n",
    "\n",
    "In many cases, it's common to apply some preprocessing to the raw data in a machine learning framework prior to training. For image classification networks, this may include conversion of raw 8-bit RGB values into floating point values between 0 and 1. Similarly, at the output of the network some postprocessing may be performed during deployment, such as extracting the indices of the classifications with the largest value (top-K indices).\n",
    "\n",
    "In FINN, we can bake some of these pre/postprocessing operatings into the graph, and in some cases these can be highly beneficial for performance by allowing our accelerator to directly consume raw data instead of going through CPU preprocessing. \n",
    "\n",
    "We'll demonstrate this for our small image classification network as follows. Brevitas preprocesses BNN-PYNQ network inputs with `torchvision.transforms.ToTensor()` [prior to training](https://github.com/Xilinx/brevitas/blob/master/src/brevitas_examples/bnn_pynq/trainer.py#L86), which converts 8-bit RGB values into floats between 0 and 1 by dividing the input by 255. We can achieve the same effect in FINN by exporting a single-node ONNX graph for division by 255 (which already exists as `finn.util.pytorch.ToTensor` and merging this with our original model. Finally, we're going to mark our input tensor as 8-bit to let FINN know which level of precision to use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.util.pytorch import ToTensor\n",
    "from qonnx.transformation.merge_onnx_models import MergeONNXModels\n",
    "from qonnx.core.datatype import DataType\n",
    "\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1_a1_tidy.onnx\")\n",
    "global_inp_name = model.graph.input[0].name\n",
    "ishape = model.get_tensor_shape(global_inp_name)\n",
    "# preprocessing: torchvision's ToTensor divides uint8 inputs by 255\n",
    "totensor_pyt = ToTensor()\n",
    "chkpt_preproc_name = build_dir+\"/tfc_w1_a1_preproc.onnx\"\n",
    "bo.export_finn_onnx(totensor_pyt, ishape, chkpt_preproc_name)\n",
    "\n",
    "# join preprocessing and core model\n",
    "pre_model = ModelWrapper(chkpt_preproc_name)\n",
    "model = model.transform(MergeONNXModels(pre_model))\n",
    "# add input quantization annotation: UINT8 for all BNN-PYNQ models\n",
    "global_inp_name = model.graph.input[0].name\n",
    "model.set_tensor_datatype(global_inp_name, DataType[\"UINT8\"])\n",
    "\n",
    "model.save(build_dir+\"/tfc_w1_a1_with_preproc.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_with_preproc.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can observe two changes in the graph above: a `Div` node has appeared in the beginning to perform the input preprocessing, and the `global_in` tensor now has a quantization annotation to mark it as an unsigned 8-bit value.\n",
    "\n",
    "For the postprocessing we'll insert a TopK node for k=1 at the end of our graph. This will extract the index (class number) for the largest-valued output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qonnx.transformation.insert_topk import InsertTopK\n",
    "\n",
    "# postprocessing: insert Top-1 node at the end\n",
    "model = model.transform(InsertTopK(k=1))\n",
    "chkpt_name = build_dir+\"/tfc_w1_a1_pre_post.onnx\"\n",
    "# tidy-up again\n",
    "model = model.transform(InferShapes())\n",
    "model = model.transform(FoldConstants())\n",
    "model = model.transform(GiveUniqueNodeNames())\n",
    "model = model.transform(GiveReadableTensorNames())\n",
    "model = model.transform(InferDataTypes())\n",
    "model = model.transform(RemoveStaticGraphInputs())\n",
    "model.save(chkpt_name)\n",
    "\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_pre_post.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the`TopK` node that has appeared at the end of the network. With our pre- and postprocessing in place, we can move on to the next step in the flow, which is streamlining."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Streamlining <a id='streamline'></a>\n",
    "Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating point operations by moving them around, then collapsing them into one operation and in the last step transform them into multi-thresholding nodes. For more information on the theoretical background of this, see [this paper](https://arxiv.org/pdf/1709.04060).\n",
    "\n",
    "Let's have a look at which sub-transformations `Streamline` consists of:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.transformation.streamline import Streamline\n",
    "showSrc(Streamline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As can be seen, several transformations are involved in the streamlining transformation. There are move and collapse transformations. In the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/main/src/finn/transformation/streamline). After each transformation, three of the tidy-up transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model.\n",
    "\n",
    "After streamlining the network looks as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.transformation.streamline.reorder import MoveScalarLinearPastInvariants\n",
    "import finn.transformation.streamline.absorb as absorb\n",
    "\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1_a1_pre_post.onnx\")\n",
    "# move initial Mul (from preproc) past the Reshape\n",
    "model = model.transform(MoveScalarLinearPastInvariants())\n",
    "# streamline\n",
    "model = model.transform(Streamline())\n",
    "model.save(build_dir+\"/tfc_w1_a1_streamlined.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_streamlined.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that the network has become simplified considerably compared to the previous step -- a lot of nodes have disappeared between the `MatMul` layers. \n",
    "\n",
    "**The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different than the example network here. We hope to rectify this in future releases.**\n",
    "\n",
    "Our example network is a quantized network with 1-bit bipolar (-1, +1 values) precision, and we want FINN to implement them as XNOR-popcount operations [as described in the original FINN paper](https://arxiv.org/pdf/1612.07119). For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into xnorpopcount operations. This transformation produces operations that are again collapsed and converted into thresholds. This procedure is shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qonnx.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount\n",
    "from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds\n",
    "from qonnx.transformation.infer_data_layouts import InferDataLayouts\n",
    "from qonnx.transformation.general import RemoveUnusedTensors\n",
    "\n",
    "model = model.transform(ConvertBipolarMatMulToXnorPopcount())\n",
    "model = model.transform(absorb.AbsorbAddIntoMultiThreshold())\n",
    "model = model.transform(absorb.AbsorbMulIntoMultiThreshold())\n",
    "# absorb final add-mul nodes into TopK\n",
    "model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())\n",
    "model = model.transform(RoundAndClipThresholds())\n",
    "\n",
    "# bit of tidy-up\n",
    "model = model.transform(InferDataLayouts())\n",
    "model = model.transform(RemoveUnusedTensors())\n",
    "\n",
    "model.save(build_dir+\"/tfc_w1a1_ready_for_hls_conversion.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1a1_ready_for_hls_conversion.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Observe the pairs of `XnorPopcountmatMul` and `MultiThreshold` layers following each other -- this is the particular pattern that the next step will be looking for in order to convert them to HLS layers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversion to HLS layers <a id='hls_layers'></a>\n",
    "Converts the nodes to HLS layers that correspond to the functions in [finn-hls library](https://finn-hlslib.readthedocs.io/en/latest/). In our case this transformation converts pairs of binary XnorPopcountMatMul layers to MatrixVectorActivation layers. Any immediately following MultiThreshold layers will also be absorbed into the MVTU.\n",
    "\n",
    "Below is the code for the transformation and the network is visualized using netron to create the new structure with `MatrixVectorActivation` nodes, which will correspond to a function call from the [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/library/matrixvector.html) library."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** The transformation `to_hls.InferBinaryMatrixVectorActivation` gets the string \"decoupled\" as argument, this indicates the `mem_mode` for the weights. In FINN there are different options to set the way the weights are stored and accessed. For details please have a look on the [FINN readthedocs website](https://finn.readthedocs.io/) under Internals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1a1_ready_for_hls_conversion.onnx\")\n",
    "model = model.transform(to_hls.InferBinaryMatrixVectorActivation(\"decoupled\"))\n",
    "# TopK to LabelSelect\n",
    "model = model.transform(to_hls.InferLabelSelectLayer())\n",
    "# input quantization (if any) to standalone thresholding\n",
    "model = model.transform(to_hls.InferThresholdingLayer())\n",
    "model.save(build_dir+\"/tfc_w1_a1_hls_layers.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_hls_layers.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each MatrixVectorActivation node has two attributes that specify the degree of folding, PE and SIMD. In all nodes the values for these attributes are set as default to 1, which would correspond to a maximum folding (time multiplexing) and thus minimum performance. We will shortly cover how these can be adjusted, but first we want to separate the HLS layers from the non-HLS layers in this network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a Dataflow Partition <a id='dataflow_partition'></a>\n",
    "\n",
    "In the graph above, you can see that there is a mixture of FINN HLS layers (MatrixVectorActivation and Thresholding_Batch) with one regular ONNX layers (Reshape). To create a bitstream, FINN needs a model with only HLS layers. In order to achieve this, we will use the `CreateDataflowPartition` transformation to create a \"dataflow partition\" in this graph, separating out the HLS layers into another model, and replacing them with a placeholder layer called StreamingDataflowPartition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.transformation.fpgadataflow.create_dataflow_partition import CreateDataflowPartition\n",
    "\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1_a1_hls_layers.onnx\")\n",
    "parent_model = model.transform(CreateDataflowPartition())\n",
    "parent_model.save(build_dir+\"/tfc_w1_a1_dataflow_parent.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_dataflow_parent.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the `MatrixVectorActivation` instances and the `Thresholding_Batch` in the beginning have all been replaced with a single `StreamingDataflowPartition`, which has an attribute `model` that points to the extracted, HLS dataflow-only graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qonnx.custom_op.registry import getCustomOp\n",
    "sdp_node = parent_model.get_nodes_by_op_type(\"StreamingDataflowPartition\")[0]\n",
    "sdp_node = getCustomOp(sdp_node)\n",
    "dataflow_model_filename = sdp_node.get_nodeattr(\"model\")\n",
    "showInNetron(dataflow_model_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see all the extracted `MatrixVectorActivation` instances and the `Thresholding_Batch` have been moved to the child (dataflow) model. We will load the child model with `ModelWrapper` and continue working on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ModelWrapper(dataflow_model_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Folding: Adjusting the Parallelism <a id='folding'></a>\n",
    "\n",
    "*Folding* in FINN describes how much a layer is time-multiplexed in terms of execution resources. There are several *folding factors* for each layer, controlled by the PE (parallelization over outputs) and SIMD (parallelization over inputs) parameters as described by the original [FINN paper](https://arxiv.org/pdf/1612.07119). The higher the PE and SIMD values are set, the faster the generated accelerator will run, and the more FPGA resources it will consume. \n",
    "\n",
    "Since the folding parameters are node attributes, they can be easily accessed and changed using a helper function of the `ModelWrapper`. But first we take a closer look at one of the nodes that implement a MatrixVectorActivation operation. This is where the Netron visualization helps us, in the above diagram we can see that the model contains four MatrixVectorActivation. So as an example we extract the second node of the graph."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the higher-level [HLSCustomOp](https://github.com/Xilinx/finn/blob/main/src/finn/custom_op/fpgadataflow/hlscustomop.py) wrappers for this node. These wrappers provide easy access to specific properties of these nodes, such as the folding factors (PE and SIMD). Let's have a look at which node attributes are defined by the CustomOp wrapper, and adjust the SIMD and PE attributes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fc0 = model.graph.node[1]\n",
    "fc0w = getCustomOp(fc0)\n",
    "\n",
    "print(\"CustomOp wrapper is of class \" + fc0w.__class__.__name__)\n",
    "\n",
    "fc0w.get_nodeattr_types()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the PE and SIMD are listed as node attributes, as well as the depths of the FIFOs that will be inserted between consecutive layers, and all can be adjusted using `set_nodeattr` subject to certain constraints. There are also a lot of additional attributes that can be set for this node type.\n",
    "**In this notebook we are setting the folding factors and FIFO depths manually but it is possible to use FINN transformations for this ([SetFolding](https://github.com/Xilinx/finn/blob/main/src/finn/transformation/fpgadataflow/set_folding.py) and [InsertAndSetFIFODepths](https://github.com/Xilinx/finn/blob/main/src/finn/transformation/fpgadataflow/set_fifo_depths.py)).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fc_layers = model.get_nodes_by_op_type(\"MatrixVectorActivation\")\n",
    "# (PE, SIMD, in_fifo_depth, out_fifo_depth, ramstyle) for each layer\n",
    "config = [\n",
    "    (16, 49, [16], [64], \"block\"),\n",
    "    (8, 8, [64], [64], \"auto\"),\n",
    "    (8, 8, [64], [64], \"auto\"),\n",
    "    (10, 8, [64], [10], \"distributed\"),\n",
    "]\n",
    "for fcl, (pe, simd, ififo, ofifo, ramstyle) in zip(fc_layers, config):\n",
    "    fcl_inst = getCustomOp(fcl)\n",
    "    fcl_inst.set_nodeattr(\"PE\", pe)\n",
    "    fcl_inst.set_nodeattr(\"SIMD\", simd)\n",
    "    fcl_inst.set_nodeattr(\"inFIFODepths\", ififo)\n",
    "    fcl_inst.set_nodeattr(\"outFIFODepths\", ofifo)\n",
    "    fcl_inst.set_nodeattr(\"ram_style\", ramstyle)\n",
    "    \n",
    "# set parallelism for input quantizer to be same as first layer's SIMD\n",
    "inp_qnt_node = model.get_nodes_by_op_type(\"Thresholding_Batch\")[0]\n",
    "inp_qnt = getCustomOp(inp_qnt_node)\n",
    "inp_qnt.set_nodeattr(\"PE\", 49)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are setting PE and SIMD so that each layer has a total folding of 16."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides PE and SIMD three other node attributes are set. `ram_style` specifies how the weights are to be stored (BRAM, LUTRAM, and so on). It can be selected explicitly or with the option `auto` you can let Vivado decide.\n",
    "`inFIFODepths` and `outFIFODepths` specifies the FIFO depths that is needed by the node from the surrounding FIFOs. These attributes are used in the transformation 'InsertFIFO' to insert the appropriate FIFOs between the nodes, which will be automatically called as part of the hardware build process.\n",
    "\n",
    "In previous versions of FINN we had to call transformations to insert data width converters, FIFOs and `TLastMarker` manually at this step. This is no longer needed, as all this is taken care of by the `ZynqBuild` or `VitisBuild` transformations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.save(build_dir+\"/tfc_w1_a1_set_folding_factors.onnx\")\n",
    "showInNetron(build_dir+\"/tfc_w1_a1_set_folding_factors.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This completes the network preparation and the network can be passed on to the next block *Vitis HLS and IPI*, which is described below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Hardware Build <a id='vivado'></a>\n",
    "\n",
    "We're finally ready to start generating hardware from our network. Depending on whether you want to target a Zynq or Alveo platform, FINN offers two transformations to build the accelerator, integrate into an appropriate shell and build a bitfile. These are `ZynqBuild` and `VitisBuild` for Zynq and Alveo, respectively. In this notebook we'll demonstrate the `ZynqBuild` as these boards are more common and it's much faster to complete bitfile generation for the smaller FPGAs found on them.\n",
    "\n",
    "As we will be dealing with FPGA synthesis tools in these tasks, we'll define two helper variables that describe the Xilinx FPGA part name and the PYNQ board name that we are targeting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the names of the supported PYNQ boards\n",
    "from finn.util.basic import pynq_part_map\n",
    "print(pynq_part_map.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# change this if you have a different PYNQ board, see list above\n",
    "pynq_board = \"Pynq-Z1\"\n",
    "fpga_part = pynq_part_map[pynq_board]\n",
    "target_clk_ns = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In previous versions of FINN, we had to manually go through several steps to generate HLS code, stitch IP, create a PYNQ project and run synthesis. All these steps are now performed by the `ZynqBuild` transform (or the `VitisBuild` transform for Alveo). **As this involves calling HLS synthesis and Vivado synthesis, this transformation will run for some time (up to half an hour depending on your PC).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.transformation.fpgadataflow.make_zynq_proj import ZynqBuild\n",
    "model = ModelWrapper(build_dir+\"/tfc_w1_a1_set_folding_factors.onnx\")\n",
    "model = model.transform(ZynqBuild(platform = pynq_board, period_ns = target_clk_ns))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the `ZynqBuild` we run one additional transformation to generate a PYNQ driver for the accelerator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver\n",
    "model = model.transform(MakePYNQDriver(\"zynq-iodma\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.save(build_dir + \"/tfc_w1_a1_post_synthesis.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Examining the generated outputs <a id='gen_outputs'></a>\n",
    "\n",
    "Let's start by viewing the post-synthesis model in Netron:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "showInNetron(build_dir + \"/tfc_w1_a1_post_synthesis.onnx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that our sequence of HLS layers has been replaced with `StreamingDataflowPartition`s, each of which point to a different ONNX file. You can open a Netron session for each of them to view their contents. Here, the first and last partitions contain only an `IODMA` node, which was inserted automatically to move data between DRAM and the accelerator. Let's take a closer look at the middle partition, which contains all our layers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ModelWrapper(build_dir + \"/tfc_w1_a1_post_synthesis.onnx\")\n",
    "sdp_node_middle = getCustomOp(model.graph.node[1])\n",
    "postsynth_layers = sdp_node_middle.get_nodeattr(\"model\")\n",
    "\n",
    "showInNetron(postsynth_layers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that `StreamingFIFO` and `StreamingDataWidthConverter` instances have been automatically inserted into the graph prior to hardware build. Transformations like `ZynqBuild` use the `metadata_props` of the model to put in additional metadata information relevant to the results of the transformation. Let's examine the metadata for the current graph containing all layers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ModelWrapper(postsynth_layers)\n",
    "model.model.metadata_props"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we see that a Vivado project was built to create what we call the `stitched IP`, where all the IP blocks implementing various layers will be stitched together. You can view this stitched block design in Vivado, or [here](StreamingDataflowPartition_1.pdf) as an exported PDF."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Moving back to the top-level model, recall that `ZynqBuild` will create a Vivado project and synthesize it, so it will be creating metadata entries related to the paths and files that were created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ModelWrapper(build_dir + \"/tfc_w1_a1_post_synthesis.onnx\")\n",
    "model.model.metadata_props"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we can see the directories that were created for the PYNQ driver (`pynq_driver_dir`) and the Vivado synthesis project (`vivado_pynq_proj`), as well as the locations of the bitfile, hardware handoff file and synthesis report."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! ls {model.get_metadata_prop(\"vivado_pynq_proj\")}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Feel free to examine the generated Vivado project to get a feel for how the system-level integration is performed for the  FINN-generated \"stitched IP\", which appears as `StreamingDataflowPartition_1` in the top-level block design -- you can see it as a block diagram exported to PDF [here](top.pdf).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.  PYNQ deployment <a id='hw_test'></a>\n",
    "\n",
    "* [Deployment](#deploy)\n",
    "* [Validation on PYNQ Board](#validation)\n",
    "* [Throughput Test on PYNQ Board](#throughput)\n",
    "\n",
    "\n",
    "The bitfile and generated driver will be copied together with some necessary files for execution into a deployment folder which then can be used to run the network on the PYNQ board."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deployment <a id='deploy'></a>\n",
    "\n",
    "We'll now create a deployment folder with the bitfile and driver file(s), we zip it and afterwards it can be copied to the PYNQ board for execution and validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from shutil import copy\n",
    "from distutils.dir_util import copy_tree\n",
    "\n",
    "# create directory for deployment files\n",
    "deployment_dir = make_build_dir(prefix=\"pynq_deployment_\")\n",
    "model.set_metadata_prop(\"pynq_deployment_dir\", deployment_dir)\n",
    "\n",
    "# get and copy necessary files\n",
    "# .bit and .hwh file\n",
    "bitfile = model.get_metadata_prop(\"bitfile\")\n",
    "hwh_file = model.get_metadata_prop(\"hw_handoff\")\n",
    "deploy_files = [bitfile, hwh_file]\n",
    "\n",
    "for dfile in deploy_files:\n",
    "    if dfile is not None:\n",
    "        copy(dfile, deployment_dir)\n",
    "\n",
    "# driver.py and python libraries\n",
    "pynq_driver_dir = model.get_metadata_prop(\"pynq_driver_dir\")\n",
    "copy_tree(pynq_driver_dir, deployment_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next to these files, we will also need an example numpy array to test the network on the PYNQ board. You may recall that one \"reshape\" node was left out of the StreamingDataflowPartition. We'll do that manually with a numpy function call when passing in the input, but everything else in the network ended up inside the StreamingDataflowPartition so that's all we need to do. The example numpy array can then be saved as .npy file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pkgutil import get_data\n",
    "import onnx.numpy_helper as nph\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "raw_i = get_data(\"qonnx.data\", \"onnx/mnist-conv/test_data_set_0/input_0.pb\")\n",
    "x = nph.to_array(onnx.load_tensor_from_string(raw_i))\n",
    "plt.imshow(x.reshape(28,28), cmap='gray')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "model = ModelWrapper(build_dir + \"/tfc_w1_a1_post_synthesis.onnx\")\n",
    "iname = model.graph.input[0].name\n",
    "oname = parent_model.graph.output[0].name\n",
    "ishape = model.get_tensor_shape(iname)\n",
    "print(\"Expected network input shape is \" + str(ishape))\n",
    "np.save(deployment_dir + \"/input.npy\", x.reshape(ishape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! ls {deployment_dir}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from shutil import make_archive\n",
    "make_archive('deploy-on-pynq-tfc', 'zip', deployment_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can now download the created zipfile (**File -> Open**, mark the checkbox next to the `deploy-on-pynq-tfc.zip` and select Download from the toolbar), then copy it to your PYNQ board (for instance via `scp` or `rsync`). Then, run the following commands **on the PYNQ board** to extract the archive and run the execution:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```shell\n",
    "unzip deploy-on-pynq-tfc.zip -d finn-tfc-demo\n",
    "cd finn-tfc-demo\n",
    "sudo python3 -m pip install bitstring\n",
    "sudo python3 driver.py --exec_mode=execute --batchsize=1 --bitfile=resizer.bit --inputfile=input.npy\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output will be saved on the PYNQ board as `output.npy` and can be copied to the host and opened with `np.load()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Validating the Accuracy on a PYNQ Board <a id='validation'></a>\n",
    "\n",
    "**Ensure that your PYNQ board has a working internet connecting for the next steps, since there is some downloading involved.**\n",
    "\n",
    "To validate the accuracy, we first need to install the [`dataset-loading`](https://github.com/fbcotter/dataset_loading) Python package to the PYNQ board. This will give us a convenient way of downloading and accessing the MNIST dataset.\n",
    "\n",
    "\n",
    "Command to execute on PYNQ board:\n",
    "\n",
    "```shell\n",
    "sudo pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now use the `validate.py` script that was generated together with the driver to measure top-1 accuracy on the MNIST dataset.\n",
    "\n",
    "Command to execute on PYNQ board:\n",
    "\n",
    "```shell\n",
    "sudo python3 validate.py --dataset mnist --batchsize 1000\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the final top-1 accuracy is 92.96%, which is very close to the 93.17% reported on the [BNN-PYNQ accuracy table in Brevitas](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Throughput Test on PYNQ Board <a id='throughput'></a>\n",
    "In addition to the functional verification, FINN also offers the possibility to measure the network performance directly on the PYNQ board. This can be done setting the `exec_mode` to `throughput_test`. \n",
    "Command to execute on PYNQ board:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```shell\n",
    "sudo python3 driver.py --exec_mode=throughput_test --batchsize=1000 --bitfile=resizer.bit\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The network metrics from the throughput test are saved in a file called `nw_metrics.txt` on the PYNQ board. Which can be investigated after running the command above."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
